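Recent studies have used multi-modal data to build models for facial action unit (AU) detection. However, because multi-modal data are heterogeneous, multi-modal representation learning is one of the main challenges. On the one hand, it is difficult to extract the relevant features from multiple modalities with a single feature extractor; on the other hand, previous studies have not fully explored the potential of multi-modal fusion strategies. For example, early fusion usually requires all modalities to be present during inference, while late fusion and intermediate fusion increase the network size needed for feature learning. In contrast to the large body of work on late fusion, few works explore channel information in early fusion. This paper presents a novel multi-modal network called Multi-modal Channel-Mixing (MCM), a pre-trained model that learns a robust representation to facilitate multi-modal fusion. We evaluate the learned representation on the downstream task of automatic facial action unit detection. Specifically, MCM is a single-stream encoder network that uses a channel-mixing module in early fusion and requires only one modality in the downstream detection task. We also use a masked ViT encoder to learn features from the fused image and two ViT decoders to reconstruct the two modalities. We have conducted extensive experiments on two public datasets, BP4D and DISFA, to evaluate the effectiveness and robustness of the proposed multi-modal framework. The results show that our approach is comparable or superior to state-of-the-art baseline methods.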
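Post-training quantization (PTQ) has attracted increasing attention because of its convenience in deploying quantized neural networks. Rounding, the primary source of quantization error, has been optimized only for model weights, while activations still use the round-to-nearest operation. In this work, we demonstrate for the first time that a well-chosen rounding scheme for activations can improve the final accuracy. To handle the dynamic nature of activation rounding schemes, we adapt the rounding border through a simple function that generates the rounding scheme at inference time. The border function accounts for the effects of weight errors, activation errors, and propagated errors in order to eliminate the bias of the element-wise error, which further benefits model accuracy. We also make the border aware of global errors so that it better fits different incoming activations. Finally, we propose the AQuant framework to learn the border function. Extensive experiments show that, compared with state-of-the-art works, AQuant achieves noticeable improvements with negligible overhead and raises the accuracy of ResNet-18 to as much as 60.3% under 2-bit weight and activation post-training quantization.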
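To make the idea of a rounding border concrete, here is a minimal sketch, not the AQuant implementation. It assumes a uniform unsigned quantizer; the function name `quantize_with_border` and its parameters are hypothetical. A border of 0.5 reproduces round-to-nearest, while any other value (or a per-element array of borders) shifts the point at which a value is rounded up.

```python
import numpy as np

def quantize_with_border(x, scale, border=0.5, n_bits=2):
    """Uniform unsigned quantization with an adjustable rounding border.

    A value is rounded up when its fractional part (in units of `scale`)
    is at least `border`, and rounded down otherwise. `border` may be a
    scalar or an array broadcastable to `x` (hypothetical per-element borders).
    """
    scaled = np.asarray(x, dtype=np.float64) / scale
    lower = np.floor(scaled)
    frac = scaled - lower
    levels = lower + (frac >= border)            # boolean adds 0 or 1
    levels = np.clip(levels, 0, 2**n_bits - 1)   # stay inside the n-bit grid
    return levels * scale                        # dequantized values

x = np.array([0.2, 0.6, 1.4, 2.6])
nearest = quantize_with_border(x, scale=1.0, border=0.5)
shifted = quantize_with_border(x, scale=1.0, border=0.3)
```

With these inputs, `nearest` rounds 1.4 down to 1.0, whereas `shifted` rounds it up to 2.0 because its fractional part exceeds the lower border. In the abstract, the border is produced by a learned function of the activations rather than fixed by hand as it is here.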
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps, or events, leaving out fine-grained interaction details about the surgical activity; yet those details are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, provides a thorough methodological comparison and an in-depth analysis of the results, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and it highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
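Learning to collaborate is critical in multi-agent reinforcement learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, which is typically characterized by mutual information (MI) in various forms. However, we reveal that sub-optimal collaborative behaviors also exhibit strong correlations, and that simply maximizing MI can hinder learning toward better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is to maximize the MI associated with superior collaborative behaviors and to minimize the MI associated with inferior ones. The two MI objectives play complementary roles: they facilitate better collaboration while avoiding convergence to sub-optimal behaviors. Experiments on a wide range of MARL benchmarks show that PMIC outperforms other algorithms.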
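The MineRL competition is designed to foster reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards. To address this challenge, we present SEIHAI, a Sample-efficient Hierarchical AI that takes full advantage of the human demonstrations and the task structure. Specifically, we split the task into several sequentially dependent subtasks and train a suitable agent for each subtask using reinforcement learning and imitation learning. We further design a scheduler that automatically selects different agents for different subtasks. SEIHAI took first place in both the preliminary and final rounds of the NeurIPS-2020 MineRL competition.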
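One of the key challenges in neural architecture search (NAS) is ranking the performance of architectures efficiently. The mainstream evaluation of performance rankers uses ranking correlations (e.g., Kendall's tau), which pay equal attention to the whole search space. However, the optimization goal of NAS is to identify the top architectures, with less attention paid to the other architectures in the search space. In this paper, we show both empirically and theoretically that Normalized Discounted Cumulative Gain (NDCG) is a better metric for rankers. We then propose a new algorithm, AceNAS, which directly optimizes NDCG with LambdaRank. It also leverages weak labels produced by weight-sharing NAS to pre-train the ranker, further reducing search cost. Extensive experiments on 12 NAS benchmarks and a large-scale search space show that our approach consistently outperforms SOTA NAS methods, improving accuracy by 3.67% and reducing search cost by 8x.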
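The contrast between a whole-space correlation and a top-weighted metric can be illustrated with a small NDCG sketch. This is only an illustration, not the AceNAS code: it assumes the common exponential gain `2**rel - 1` with a `log2` position discount, and the toy relevance values stand in for architecture quality.

```python
import numpy as np

def dcg(relevance, k):
    """Discounted cumulative gain of the first k items, in the given order."""
    rel = np.asarray(relevance, dtype=np.float64)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(2), log2(3), ...
    return float(np.sum((2.0**rel - 1.0) / discounts))

def ndcg(true_scores, predicted_scores, k):
    """NDCG@k of the ranking induced by predicted_scores (higher is better)."""
    true_scores = np.asarray(true_scores, dtype=np.float64)
    predicted = np.asarray(predicted_scores, dtype=np.float64)
    order = np.argsort(-predicted, kind="stable")    # best predicted first
    ideal_dcg = dcg(np.sort(true_scores)[::-1], k)
    return dcg(true_scores[order], k) / ideal_dcg if ideal_dcg > 0 else 0.0

true_quality = [3, 2, 1, 0]        # toy ground-truth relevance of four architectures
swap_top = [2, 3, 1, 0]            # ranker confuses the two best architectures
swap_bottom = [3, 2, 0, 1]         # ranker confuses the two worst architectures

ndcg_top = ndcg(true_quality, swap_top, k=4)
ndcg_bottom = ndcg(true_quality, swap_bottom, k=4)
```

Both toy rankers make exactly one adjacent swap, so a pairwise rank correlation treats them the same, yet `ndcg_top` comes out lower than `ndcg_bottom` because the error at the top of the list is discounted least.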
Ramp merging is a typical application of cooperative intelligent transportation systems (C-ITS). Vehicle trajectories perceived by roadside sensors are an important complement to the limited field of view of on-board perception. This paper proposes a vehicle tracking and trajectory denoising algorithm that takes full advantage of roadside cameras to estimate vehicle trajectories and speed profiles. A dynamic speed guidance algorithm is proposed to help on-ramp vehicles merge into the mainline smoothly, even in a non-cooperative environment where mainline vehicles are not expected to slow down to accommodate on-ramp vehicles. On-site experiments were carried out in a merging area of the Hangzhou Belt Highway to test our prototype system, and simulation analysis shows that the proposed algorithm can achieve significant fuel savings during the ramp merging process.
Most regularized tensor regression research focuses on tensor predictors with scalar responses or vector predictors with tensor responses. We consider sparse low-rank tensor-on-tensor regression, where the predictors $\mathcal{X}$ and responses $\mathcal{Y}$ are both high-dimensional tensors. By demonstrating that the general inner product or the contracted product on a unit-rank tensor can be decomposed into standard inner products and outer products, we show that the problem can be transformed into a tensor-to-scalar regression followed by a tensor decomposition. We therefore propose a fast solution based on a stagewise search composed of a contraction part and a generation part, which are optimized alternately. We demonstrate that our method can outperform current methods in terms of accuracy and predictor selection by effectively incorporating structural information.
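A standard identity closely related to the decomposition mentioned above is that the inner product of a tensor with a rank-one tensor reduces to a sequence of ordinary vector contractions, one per mode. The sketch below only checks that identity numerically; it is not the paper's algorithm, and the helper names `inner_with_rank_one` and `outer` are hypothetical.

```python
import numpy as np

def inner_with_rank_one(X, factors):
    """Compute <X, f1 o f2 o ... o fN> by contracting one mode at a time."""
    result = np.asarray(X, dtype=np.float64)
    for f in factors:
        # Contract the current leading mode of the tensor with vector f.
        result = np.tensordot(f, result, axes=([0], [0]))
    return float(result)

def outer(factors):
    """Build the rank-one tensor f1 o f2 o ... o fN explicitly."""
    out = np.asarray(factors[0], dtype=np.float64)
    for f in factors[1:]:
        out = np.multiply.outer(out, f)
    return out

X = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
a, b, c = [1.0, 2.0], [1.0, 0.0, -1.0], [0.5, 1.0, 1.5, 2.0]

contracted = inner_with_rank_one(X, [a, b, c])
direct = float(np.sum(X * outer([a, b, c])))   # elementwise product, then sum
```

The contraction route never materializes the full rank-one tensor, which is what makes this kind of decomposition attractive when the tensors are high-dimensional.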
While pre-trained Chinese language models have demonstrated impressive performance on a wide range of NLP tasks, the Chinese Spell Checking (CSC) task remains a challenge. Previous research has explored using information such as glyphs and phonetics to improve the ability to distinguish misspelled characters, with good results. However, the generalization ability of these models is not well understood: it is unclear whether they incorporate glyph-phonetic information and, if so, whether this information is fully utilized. In this paper, we aim to better understand the role of glyph-phonetic information in the CSC task and suggest directions for improvement. Additionally, we propose a new, more challenging, and practical setting for testing the generalizability of CSC models. All code is made publicly available.
We present pyRDDLGym, a Python framework for the auto-generation of OpenAI Gym environments from RDDL declarative descriptions. The discrete time step evolution of variables in RDDL is described by conditional probability functions, which fits naturally into the Gym step scheme. Furthermore, since RDDL is a lifted description, modifying and scaling up environments to support multiple entities and different configurations becomes trivial rather than a tedious, error-prone process. We hope that pyRDDLGym will bring fresh momentum to the reinforcement learning community by enabling easy and rapid development of benchmarks, thanks to the unique expressive power of RDDL. By providing explicit access to the model in the RDDL description, pyRDDLGym can also facilitate research on hybrid approaches that learn from interaction while leveraging model knowledge. We present the design and built-in examples of pyRDDLGym, as well as the additions to the RDDL language that were incorporated into the framework.